Supplementary Material for Multilevel Clustering via Wasserstein Means
Abstract
... ${}_{i,j} \in \mathbb{R}_+^{k \times k'}$ is the cost matrix, i.e. the matrix of pairwise distances between the elements of $G$ and $G'$, and $\langle A, B \rangle = \operatorname{tr}(A^\top B)$ is the Frobenius dot-product of matrices. The optimal $T \in \Pi(G, G')$ in optimization problem (1) is called the optimal coupling of $G$ and $G'$, representing the optimal transport between these two measures. When $k = k'$, the complexity of the best algorithms for finding the optimal transport is $O(k^3 \log k)$. Recently, Cuturi (2013) proposed a regularized version of (1) based on the Sinkhorn distance, for which the complexity of finding an approximation of the optimal transport is $O(k^2)$. Due to its favorably fast ...
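As a rough illustration of the regularized approach attributed to Cuturi (2013) above, the following is a minimal sketch of Sinkhorn iterations in NumPy. It is not the authors' implementation: the function name `sinkhorn_coupling`, the regularization strength `reg`, the iteration count `n_iter`, and the toy atoms are all assumptions chosen for illustration. Each iteration uses only matrix-vector products with a k × k′ kernel.

```python
import numpy as np

def sinkhorn_coupling(a, b, M, reg=1.0, n_iter=500):
    """Approximate the optimal coupling T between two discrete measures.

    a, b : weight vectors of the two measures (shapes (k,) and (k',)).
    M    : (k, k') cost matrix of pairwise distances between atoms.
    reg  : entropic regularization strength (hypothetical default).
    """
    K = np.exp(-M / reg)              # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # rescale to match column marginals
        u = a / (K @ v)               # rescale to match row marginals
    T = u[:, None] * K * v[None, :]   # coupling diag(u) K diag(v)
    cost = np.sum(T * M)              # Frobenius product <T, M> = tr(T^T M)
    return T, cost

# Toy example: two measures with three atoms each on the real line.
x = np.array([[0.0], [1.0], [2.0]])   # atoms of G
y = np.array([[0.5], [1.5], [2.5]])   # atoms of G'
a = np.full(3, 1.0 / 3.0)             # uniform weights on G
b = np.full(3, 1.0 / 3.0)             # uniform weights on G'
M = np.abs(x - y.T) ** 2              # squared pairwise distances, shape (3, 3)

T, cost = sinkhorn_coupling(a, b, M)
```

The returned `T` is an approximate coupling whose row and column sums are close to `a` and `b`; a smaller `reg` brings `cost` closer to the unregularized transport cost, at the price of slower convergence and possible numerical underflow in the kernel.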
Similar resources
Multilevel Clustering via Wasserstein Means
We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. W...
Wasserstein k-means++ for Cloud Regime Histogram Clustering
Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Eu...
Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describe individuals in terms of empirical distributions. This kind of data can be considered as complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatial or temporal variant data, results of queries, environme...
Supplementary Material of Differentially Private Clustering in High-Dimensional Euclidean Spaces
Non-Private Clustering: There is a wide range of prior work on the problem of center-based clustering in the absence of privacy requirement. It is known that exact optimization of objective function in R is not computationally possible (Dasgupta, 2008) even for the problem of 2-means clustering. To avoid the computational obstacle, several approximation algorithms have been developed, e.g., by ...
Supplementary Material for Bayesian Nonparametric Multilevel Clustering with Contexts
Vu Nguyen†, Dinh Phung†, XuanLong Nguyen‡, S. Venkatesh†, and Hung Bui∗ †Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Australia. {tvnguye,dinh.phung,svetha.venkatesh}@deakin.edu.au ‡Department of Statistics, Dept of Electrical Engineering and Computer Science University of Michigan. [email protected] ∗Laboratory for Natural Language Understanding, Nuance Commun...